PROCESS FOR SELECTING ANTI-SENSE OLIGONUCLEOTIDES

Field of the Invention 

This invention relates to a means of selecting nucleotide sequences having a
selected value for free energy variance. The methods allow for selecting anti-sense
oligonucleotides, e.g., for use as pharmacological agents, from a set of candidates, e.g.,
those provided by a given nucleic acid, e.g., an mRNA.

Background of the Invention 

Antisense therapy involves the administration of exogenous oligonucleotides
that bind to a target nucleic acid, typically an RNA molecule, located within cells. The
term antisense is so given because the oligonucleotides are typically complementary to
mRNA molecules ("sense strands") which encode a cellular product. The ability to use
anti-sense oligonucleotides to inhibit expression of mRNAs, and thereby to inhibit
protein expression in vivo, is well documented. However, selection of an appropriate
complementary oligonucleotide (or oligonucleotides) to a given mRNA is not always
simple (see, e.g., Crooke, S.T., FASEB J. 7: 533-539 (1993), incorporated herein by
reference). Anti-sense agents typically need to continuously bind all target RNA
molecules so as to inactivate them or alternatively provide a substrate for endogenous
ribonuclease H (RNase H) activity. Sensitivity of RNA/oligonucleotide complexes,
generated by the methods of the present invention, to RNase H digestion can be
evaluated by standard methods (see, e.g., Donia, B. P., et al., J. Biol. Chem. 268
(19):14514-14522 (1993); Kawasaki, A. M., et al., J. Med. Chem. 6(7):831-841 (1993),
incorporated herein by reference).


Summary of the Invention 

Prior art methods do not provide efficient means of determining which
complementary oligonucleotides to a given mRNA will be useful in an application.
Shorter (15-200 base) anti-sense molecules are preferred in clinical applications. In fact,
a minimum of 15 base anti-sense oligonucleotides is preferred. The invention includes
methods for selecting desired anti-sense oligonucleotides from the set of candidates
provided by any given nucleic acid, e.g., an mRNA. In particular, the invention
provides a means of determining desired sequence positions, e.g., those which
present a desired level of free energy variation on the mRNA, against which to design
anti-sense oligonucleotides, thus reducing the empiricism currently employed.

In one aspect, the invention features a method of identifying a site on a nucleic
acid sequence having high free energy variability. This allows determination of sites
which are preferred for oligonucleotide, e.g., antisense, binding. The method includes
some or all of the following steps:
providing a nucleotide sequence, e.g., sequence from a target gene;

casting the nucleotide sequence as the free energy as a function of base pair
position;

calculating the free energy of X windows centered on a base pair for a plurality
of base pairs from the nucleotide sequence for every, or at least a plurality of, window
sizes between 2 and Y, where Y is an integer between 3 and 1,000, more preferably
between 2 and 100;

for each window size, constructing a free energy distribution along the sequence,
preferably normalizing the distribution to a standard scale (to account for the fact that
the free energy is proportional to window size) (this calculation gives the results which
can be plotted as shown in Figure 1);

finding the mean normalized free energy values for all windows for each base
pair position (this gives results which can be plotted as in Figure 2; it also represents the
"carrier");

subtracting the mean value for a position and providing the deviation from the
mean at each base position to determine those sequences which show high variability.
The results can be plotted as in Figure 3 (point "a" in Figs. 2 and 3 corresponds to high
variability).

In certain embodiments, free energy values are calculated for a plurality of
window sizes at at least Z percent of the base pairs of the nucleotide sequence,
wherein Z is 5, 10, 20, 30, 40, 50, 60, 70, 80, or 90.

In another embodiment, the invention provides a method of identifying an
optimized ligand binding site on a nucleic acid sequence. The method includes the steps
of providing a nucleic acid sequence; calculating a free energy value for at least two
window sizes at each of a plurality of base pairs of the nucleic acid sequence;
normalizing the free energy values for each window size at each base pair to a standard
scale; calculating a deviation of each normalized free energy value at a base pair
from a mean normalized free energy value at the base pair; and selecting a base pair at
which a large deviation from the normalized free energy value is calculated, relative to
at least one other base pair; such that an optimized ligand binding site on the nucleic
acid sequence is identified.

In certain embodiments of this method, free energy values are calculated for a plurality of
window sizes at at least Z percent of the base pairs of the nucleic acid sequence,
wherein Z is 5, 10, 20, 30, 40, 50, 60, 70, 80, or 90.

In certain embodiments, free energy values are calculated for at least N window
sizes at each of the plurality of base pairs, wherein N is 2, 5, 10, 15, 20, 30, 40,
or 50.

In another aspect, the invention provides for a method for determining preferred
anti-sense sequence complements within a predefined RNA sequence; these are generally
high variability sequences. (As used herein, high variability can be a relative parameter,
e.g., relative to other variability in the sequence. Alternatively, it can be relative to a
predefined value.)

In another aspect, the invention provides for sets (e.g., sets of 2, 3, 4 or more) of
sequences, e.g., anti-sense oligonucleotides, of an optimal duplex free energy or
variability but variable length at the sites of anti-sense candidates within candidate
regions.

In another aspect, the invention provides sets of isoenergetic, or isovariable,
oligonucleotides, e.g., anti-sense candidates of a set length within a candidate region.

In yet another aspect, the invention provides for establishing oligonucleotides
(e.g., sets of 2, 3, 4 or more oligonucleotides), e.g., anti-sense oligonucleotides, of a
preselected melting temperature, Tm, within candidate regions.

Generally, the method allows for identification, choosing, and matching of 
sequences with desired free energy variability characteristics. 

Methods of the invention can be used for any of the following: 
Determining the best anti-sense candidate regions, or sub-sequences, within any
given anti-sense target. Such sequences exhibit wide variation in average energy as a
function of increasing length.

Designing desirable attributes such as Tm, free energy and length coupled with
sequence composition to arrive at the best anti-sense oligonucleotide candidates 10-200
bases in length including the pre-identified candidate regions.

Providing compositions of sequence which display the identified variation in 
sequence composition with changing (e.g., increasing) window size. 

Any method of the invention can include providing a sequence, e.g., by
synthesizing (by chemical or biochemical methods), or by placing in a reaction mixture
which includes a carrier, e.g., a liquid, e.g., water.

Brief Description of the Drawings 

Figure 1 is a plot of normalized energy as a function of window size and position 
along a representative DNA sequence. 

Figure 2 is an overlaid plot of the data shown in Figure 1.

Figure 3 is a plot of the variability of energy distributions along the 
representative DNA sequence. 

Detailed Description of the Invention

The invention provides a method(s) for determining base position(s) on a
preselected mRNA sequence where best hybridization of an oligonucleotide will occur.
Note that the mRNA may be a pre-mRNA (hnRNA), thus containing untranscribed
regions to be spliced out, and that included in this mRNA/pre-mRNA are a variety of
control sequences which allow binding of various cellular components.

For example, if one were to approach the problem of anti-sense design randomly
on, for example, a 1000 base target mRNA molecule, then one could pick a set length
oligonucleotide, e.g., 30 bases, and synthesize a thirty-mer starting at position 1 of the
mRNA and complementary to positions 1-30 of the mRNA, followed by synthesis of a
second thirty-mer starting at position 2 and ending at position 31. This iterative process
of synthesis followed to its conclusion results in [1000 base mRNA - 2(30 base anti-sense
length) + 2] = 942 thirty base anti-sense oligonucleotides. Similarly, of course, one
might also select nineteen-mers as the optimal length resulting in [1000 base mRNA -
2(19 base anti-sense length) + 2] = 964 nineteen-base anti-sense oligonucleotides.

In fact, one could synthesize all such complementary oligonucleotides of length
less than the mRNA length and try to inhibit protein synthesis with each in an attempt to
find the best anti-sense oligonucleotide for a given mRNA. However, in practice this
approach would be an enormous undertaking. Clearly the process of selecting an
anti-sense oligonucleotide of length suitable for large scale use as a pharmaceutical while
showing in vivo activity would be simplified by identifying the "best" mRNA sequence
position to target an anti-sense oligonucleotide against.

The method is described below with reference to data from a representative
target nucleic acid sequence (LDH M72545, base positions from 64-924; the sequence is
available through GENBANK).

The algorithm for determining relatively "reactive" sites along genomic DNA is
based on a representation of duplex DNA in terms of its sequence dependent melting
free-energy. This presents the DNA sequence as energy contours that, when scrutinized in
the proper way, can lead to direct determination of specific sites that are optimum for
targeting by anti-sense therapeutic agents.


There are six steps to the current method with at least one step (4) being 
considered optional: 

(1) Free-Energy Representation of DNA Sequences: For a DNA sequence comprised of
N base pairs (bps), each bp i can be assigned a melting free-energy value, $\Delta G_i$,

$$\Delta G_i = \Delta G^{H\text{-}B}_i + \left(\Delta G^{S}_{i,i-1} + \Delta G^{S}_{i,i+1}\right)/2$$

where $\Delta G^{H\text{-}B}_i$ is the free-energy of hydrogen bonding, which typically can take on only
two values (for A-T or G-C type bps), and $\Delta G^{S}_{i,i-1}$ and $\Delta G^{S}_{i,i+1}$ are the nearest-neighbor
sequence dependent stacking free-energies for the stacking interactions between bp i and
bps i-1 and i+1. Utilizing this equation each bp can be assigned a free-energy of
melting.
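
As a minimal sketch of step (1), the per-bp assignment can be written as follows. The numerical values in `HBOND` and the `stacking` function are hypothetical placeholders in arbitrary units, not measured thermodynamic parameters, and treating the missing neighbor of a terminal bp as contributing zero is an assumption of this sketch rather than something the text specifies.

```python
# Placeholder hydrogen-bonding energies (arbitrary units); illustrative only.
HBOND = {"A": 1.0, "T": 1.0, "G": 2.0, "C": 2.0}


def stacking(a: str, b: str) -> float:
    """Hypothetical stacking energy between adjacent bps a and b."""
    return 0.5 * (HBOND[a] + HBOND[b])


def bp_free_energies(seq: str) -> list[float]:
    """Assign dG_i = dG_HB(i) + (dG_S(i,i-1) + dG_S(i,i+1)) / 2 to each bp."""
    n = len(seq)
    energies = []
    for i, base in enumerate(seq):
        # Assumption: a terminal bp has no neighbor on one side, so that term is 0.
        left = stacking(seq[i - 1], base) if i > 0 else 0.0
        right = stacking(base, seq[i + 1]) if i < n - 1 else 0.0
        energies.append(HBOND[base] + (left + right) / 2)
    return energies
```

In practice the placeholder tables would be replaced by a published nearest-neighbor parameter set appropriate to the duplex type under study.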

(2) Construction of Free-Energy Windows: In this procedure, windows of bps
containing from 2 to 200 bps are individually examined. For each window size, starting
at bp 1, the free-energies of the bps in the window are summed and plotted as the
first point. The window is then moved over one bp and the free-energy of the new
window, which contains the free-energy of a new bp and all the intervening bps but not
the free-energy of the first bp of the previous window, is determined. The process is
continued until the last window reaches the end of the DNA sequence under
consideration. Formally, for each window size j (here j = 10-42 bps) and each starting
bp s = 1, ..., N-j+1, the free-energy of each window is given by

$$\Delta G^{w}_{j}(s) = \sum_{i=s}^{s+j-1} \Delta G_i$$

Thus, plotting the values of $\Delta G^{w}_{j}$ vs bp position s results in an energy contour for that
particular window size, j. Since the magnitude of $\Delta G^{w}_{j}$ increases with the size of j,
relative features of energy contours constructed for different window sizes are difficult
to compare directly.
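
The sliding-window sum of step (2) can be sketched as below. The function name `window_energies` is hypothetical, positions are 0-based in the code (the text counts from bp 1), and the input is assumed to be a list of per-bp free energies already computed as in step (1).

```python
def window_energies(dg: list[float], j: int) -> list[float]:
    """Return the summed free energy of every window of j bps.

    Entry s (0-based) is the sum dg[s] + ... + dg[s + j - 1], so the
    result has len(dg) - j + 1 entries, one per window start position.
    """
    if not 1 <= j <= len(dg):
        raise ValueError("window size must be between 1 and len(dg)")
    total = sum(dg[:j])
    contour = [total]
    for s in range(1, len(dg) - j + 1):
        # Slide by one bp: add the new bp, drop the first bp of the previous window.
        total += dg[s + j - 1] - dg[s - 1]
        contour.append(total)
    return contour
```

Updating the running total instead of re-summing each window mirrors the description in the text, where each new window gains one bp and loses one.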


(3) Direct Comparisons of Energy Contours Constructed with Different Window Sizes:
To facilitate such a direct comparison the values of $\Delta G^{w}_{j}$ determined for different values
of j are normalized relative to the maximum free-energy difference of any two windows
of size j. Thus the normalized free-energy for each window is given by

$$\langle \Delta G^{w}_{j} \rangle = \frac{\left| \Delta G^{w}_{j} - \Delta G^{w}_{j}(\min) \right|}{\left| \Delta G^{w}_{j}(\max) - \Delta G^{w}_{j}(\min) \right|}$$

where $\Delta G^{w}_{j}(\max)$ and $\Delta G^{w}_{j}(\min)$ are the maximum and minimum free-energies observed
for all the windows of size j along the sequence. Now the free-energy contours
constructed with different window sizes consist of a distribution of relative free-energies
with values between 0 and 1 vs bp position.

Figure 1 is a plot of normalized energy as a function of window size and position
along the representative DNA sequence (LDH M72545, base positions from 64-924).
The window size was varied over a range from 10 to 42 for each position, and the energy
profile for each base position and window size was plotted.
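
The normalization of step (3) amounts to a min-max rescaling of each contour, which can be sketched as follows. The function name is hypothetical, and returning zeros for a perfectly flat contour (where the maximum equals the minimum) is an assumption made here to avoid division by zero; the text does not address that case.

```python
def normalize_contour(contour: list[float]) -> list[float]:
    """Rescale one window-size contour to the range 0..1."""
    lo, hi = min(contour), max(contour)
    span = abs(hi - lo)
    if span == 0:
        # Assumption: a flat contour carries no relative features, so map it to 0.
        return [0.0 for _ in contour]
    return [abs(value - lo) / span for value in contour]
```

Because each window size is rescaled by its own extremes, contours built with different values of j become directly comparable on a common 0 to 1 scale.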

(4) [Optional step] Overlapping Energy Contours Constructed with Different Window
Sizes: A more direct comparison of these energy contours is to "overplot" them (e.g., plot
one data set over another) as shown in Fig. 2. Features of the distribution of
melting stability are clearly apparent and apparently only slightly dependent on window
size over the range examined. Regions of lowest magnitude are the least stable while
regions of highest magnitude are the most stable. Although the same general features
are observed on all the distribution functions shown in Figure 2, there are small
deviations (on the order of 10-20%) about what appears to be the "average" shape of the
distribution. These distributions directly reveal the contributions of hydrogen bonding
and nearest-neighbor stacking to DNA stability. The prominent features of the
distribution are generally determined by the amount of A-T or G-C type bps in the
sequence. For example, the peaks in the overlaid plots of Fig. 2 depict regions relatively
higher in G-C percentage. The converse is true for the "valleys," which reveal a larger
percentage of A-T type base pairs in that region. Because of the greater relative energy
of the peaks as compared to the valleys, the effect of window size is more pronounced at
the peaks.

(5) Deviations from the Window Size Average Reveal Targetable Regions: The
superimposed "noise" or deviations from the mean behavior of the distribution for the
different window sizes seen in Fig. 2 reveals the influence of nearest-neighbor stacking
on DNA stability. It is this noise pattern that can be isolated. To better examine this
component of the distribution functions, the average over all normalized energies
determined for each window size is determined at each bp position, s. That is,

$$\langle \Delta G^{w} \rangle_{ave}(s) = \frac{1}{N_w} \sum_{j} \langle \Delta G^{w}_{j} \rangle(s)$$

where $N_w$ is the number of window sizes. Now the differences,

$$\delta\langle \Delta G^{w} \rangle_{ave}(s) = \langle \Delta G^{w} \rangle_{ave}(s) - \langle \Delta G^{w}_{j} \rangle(s)$$

are determined and plotted vs sequence position for each window size as shown in Fig.
3. The result is a "noise" pattern with most values between -0.20 and +0.20 centering
around 0 along the bp position. Notably, several regions emerge from this pattern which
display larger range than the preselected noise criteria. These regions are clearly seen in
Fig. 3 (e.g., the point labelled "A" in Figure 3 has large variability) and display the
highest variability in sequence dependent stability with changes in window size and after
scaling the values for the entire sequence set as described above. These are the desired
targets for sequence specific anti-sense therapeutics.
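
A minimal sketch of step (5) follows. It assumes the normalized contours are supplied as a dictionary keyed by window size and indexed by window start position. Because a larger window has fewer start positions, the sketch restricts the average to the positions shared by all window sizes; that alignment choice, like the function name, is an assumption of this example (centering each window on a bp would be an alternative).

```python
def deviations_from_mean(contours: dict[int, list[float]]):
    """Return (mean, delta) for normalized contours keyed by window size j.

    mean[s]     : average of the normalized energies over all window sizes
    delta[j][s] : mean[s] minus the normalized energy for window size j
    """
    # Assumption: only positions present for every window size are compared.
    n_common = min(len(c) for c in contours.values())
    n_w = len(contours)
    mean = [sum(c[s] for c in contours.values()) / n_w for s in range(n_common)]
    delta = {
        j: [mean[s] - c[s] for s in range(n_common)]
        for j, c in contours.items()
    }
    return mean, delta
```

Positions where the values in `delta` spread widely across window sizes correspond to the high-variability regions described in the text.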


(6) Selection of Sequences: The 200 base sequences (other lengths, e.g., 150, 100, or 50
bases, can also be used), 100 to either side of the "variational maxima" seen on the plots
of $\delta\langle \Delta G^{w} \rangle_{ave}(s)$ vs s, are identified from the mRNA sequence and subjected
to further examination. While these 200-mers could be used as anti-sense
oligonucleotides immediately, it is more desirable to use smaller oligomers comprising,
e.g., approximately 50, 40, 30 or fewer bases that are subsequences of the selected
200-mer. Optimal anti-sense candidate oligomers within the 200-mer will contain a 2-10 bp
more stable region flanked by relatively unstable regions.
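
One way to locate a variational maximum and the region around it is sketched below. The text does not define how the maximum is scored; this example assumes the score at each position is the range (largest minus smallest) of the deviations across window sizes. The function name, the 0-based half-open interval it returns, and the clipping at the sequence ends are likewise assumptions.

```python
def candidate_region(delta: dict[int, list[float]], n_bases: int, flank: int = 100):
    """Return (peak, start, end) for the region around the largest spread.

    delta maps window size j to its deviation-from-mean values per position.
    The region is the 0-based half-open interval [start, end), clipped to
    the sequence, extending `flank` bases to either side of the peak.
    """
    n = min(len(d) for d in delta.values())
    # Assumed score: spread of the deviations across window sizes at position s.
    spread = [
        max(d[s] for d in delta.values()) - min(d[s] for d in delta.values())
        for s in range(n)
    ]
    peak = max(range(n), key=spread.__getitem__)
    start = max(0, peak - flank)
    end = min(n_bases, peak + flank)
    return peak, start, end
```

With the default `flank` of 100 this yields the 200 base region described above; smaller values give the 150, 100, or 50 base alternatives.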

In some applications it may be desirable to select sets of anti-sense
oligonucleotides all with a pre-defined optimal duplex free-energy but with different
lengths. This is done by scanning the energetic distribution of the 200 bp
region and determining the various pieces from 15 to 30 bps in length that have the same
calculated free-energy of stability.

In other applications it may be desirable to select sets of isoenergetic anti-sense
candidates of a given length. This is done by scanning the energetic distribution of the
200 bp region and determining the various pieces of a given length that have the same
calculated free-energy.
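
Both scans described above (equal energy with variable length, and equal energy at a fixed length) can be sketched with one function. The function name, the `target` energy, and the tolerance `tol` used to decide that two energies are "the same" are assumptions of this example; the text does not specify how close the energies must be.

```python
from itertools import accumulate


def isoenergetic_pieces(dg, target, tol, min_len=15, max_len=30):
    """List (start, length, energy) for pieces whose summed energy is near target.

    dg is the list of per-bp free energies for the candidate region;
    start is 0-based. Set min_len == max_len to scan a single fixed length.
    """
    prefix = [0.0, *accumulate(dg)]  # prefix[k] = sum of dg[:k]
    hits = []
    for length in range(min_len, max_len + 1):
        for s in range(len(dg) - length + 1):
            energy = prefix[s + length] - prefix[s]
            if abs(energy - target) <= tol:  # assumed matching criterion
                hits.append((s, length, energy))
    return hits
```

The prefix sums let each piece's energy be read off in constant time, so scanning all lengths from 15 to 30 over a 200 bp region stays inexpensive.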

In other applications it may be desirable to choose anti-sense oligonucleotides of
a preselected melting temperature, $T_m$. This can be done using the formula

$$T_m = \frac{\Delta H_D + \Delta H_{nuc}}{\Delta S_D + \Delta S_{nuc} + \ln(\alpha C_T)}$$

where $\Delta H_D$ and $\Delta S_D$ are the calculated melting enthalpy and entropy for the particular
sequence.

The entropy of nucleation is $\Delta S_{nuc}$ and is regarded as a constant for a particular
type of target in our equational formulation. That is, it does not depend on oligomer
length. In contrast, the enthalpy of duplex nucleation, $\Delta H_{nuc}$, is primarily electrostatic
in nature and therefore depends on sequence length, G-C percentage and salt
concentration. The total strand concentration is $C_T$ and $\alpha$ is a factor that properly
accounts for sequence degeneracies in association of the oligomers. Overall, stability of
the chosen oligomers can therefore be adjusted by changes in G-C percentage and length.
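
The melting temperature formula can be evaluated as in the sketch below, which follows the formula as printed in this document. The printed form shows the logarithmic term without an explicit prefactor; the hypothetical `ln_prefactor` argument defaults to 1.0 to match it, and can be set to a gas constant in matching units if that is what the enthalpy and entropy units require. All argument names are assumptions of this example.

```python
import math


def melting_temperature(dh_duplex, dh_nuc, ds_duplex, ds_nuc,
                        alpha, c_total, ln_prefactor=1.0):
    """Evaluate Tm = (dH_D + dH_nuc) / (dS_D + dS_nuc + k * ln(alpha * C_T)).

    k is ln_prefactor; 1.0 reproduces the formula as printed (assumption:
    supply a gas constant here if the chosen units call for one).
    """
    denominator = ds_duplex + ds_nuc + ln_prefactor * math.log(alpha * c_total)
    if denominator == 0:
        raise ZeroDivisionError("entropy term sums to zero; Tm is undefined")
    return (dh_duplex + dh_nuc) / denominator
```

To choose oligomers of a preselected melting temperature, this function could be evaluated over the candidate pieces of a region and the pieces whose value falls closest to the desired temperature retained.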


While the invention has been described with reference to selection of sequences
which are suitable targets for the design of antisense oligonucleotides, it will be
appreciated that the methods described herein can be used to identify regions of a target
nucleic acid sequence (including, but not limited to, a coding or non-coding DNA or
RNA) which are suitable for interaction with other ligands which can bind to the nucleic
acid, including one or more of: a compound which binds to a nucleic acid in a
sequence-specific way (e.g., a sequence specific cleavage enzyme, such as a restriction
endonuclease, including EcoRI, HaeIII, BamHI and BglI, or an enzyme or other
molecule which binds to a specific sequence, e.g., molecules which modulate the
expression of a product encoded by a nucleic acid) or in a sequence-non-specific way
(e.g., DNaseI or micrococcal nuclease); a protein; an enzyme; an enzyme or other
molecule (and agonists or antagonists thereof) which alters the structure of a nucleic acid
to which it binds, e.g., by breaking or forming a covalent or non-covalent bond, e.g., a
hydrogen bond, between an atom of the nucleic acid and another atom, e.g., an atom of
the same strand, an atom of the complementary sequence, or an atom of another
molecule; an enzyme which cleaves one or both strands of the nucleic acid, and agonists
or antagonists thereof; an enzyme which methylates or alkylates the nucleic acid, and
agonists or antagonists thereof; an enzyme which promotes or catalyzes the synthesis of
a nucleic acid, e.g., a polymerase which requires a double stranded primer, and agonists
or antagonists thereof; a DNA polymerase, e.g., DNA polymerase I or Taq polymerase,
and agonists or antagonists thereof; an enzyme which alters the primary or secondary
structure of a nucleic acid, e.g., a topoisomerase, or an enzyme related to recombination
or replication, and agonists or antagonists thereof; a DNA binding ligand, and agonists
or antagonists thereof; a mutagen; a compound which enhances gene expression, and
agonists or antagonists thereof; a compound which intercalates into a double stranded
nucleic acid, and agonists or antagonists thereof; a compound which, when contacted
with a reaction mixture comprising a first single stranded nucleic acid and a second
single stranded nucleic acid, will accelerate the rate of duplex formation at least n-fold,
wherein n is an integer between 2 and 1,000, inclusive; a compound which will decrease
the free energy of duplex formation by n-fold, wherein n is an integer between 1 and
1,000 inclusive; a small molecule, e.g., any metalloorganic compound, any heterocyclic
compound, or any protein which binds a nucleic acid; proteins or other molecules which
are associated with the structural organization of DNA in the cell nucleus, or the
packaging of DNA, including histones and nucleosomes; nucleic acid binding mutagens
or carcinogens, or agonists or antagonists thereof; viral proteins and agonists or
antagonists thereof. Thus, the methods of the invention have broad applicability.

Those skilled in the art will recognize, or be able to ascertain using no more than
routine experimentation, numerous equivalents to the specific procedures described
herein. Such equivalents are considered to be within the scope of this invention and are
covered by the following claims.

The contents of all references and patent applications cited herein are hereby 
incorporated by reference. 

Other embodiments are within the following claims.

What is claimed is:

1. A method of identifying a site on a nucleic acid sequence having a desired free
energy variability comprising:

providing a nucleotide sequence of length Z;

calculating the free energy of a plurality of windows centered on a base pair for a
plurality of base pairs from the nucleotide sequence, wherein the number of window
sizes is between 2 and Y, wherein Y is an integer between 3 and 100;

for each window size, constructing a free energy distribution along the sequence,
normalizing the distribution to a standard scale;

determining the mean normalized free energy values for all windows for each
base pair position;

subtracting the mean value for a position and providing the deviation from the
mean of each base position to determine a site having the desired free energy variability.


2. The method of claim 1, wherein the plurality of base pairs comprises at least 
50% of the base pairs of the nucleotide sequence. 

3. A method of identifying an optimized ligand binding site on a nucleic acid
sequence, the method comprising:

providing a nucleic acid sequence;

calculating a free energy value for at least two window sizes at each of a plurality
of base pairs of the nucleic acid sequence;

normalizing the free energy values for each window size at each base pair to a
standard scale;

calculating a deviation of each normalized free energy value at a base pair from a
mean normalized free energy value at the base pair; and

selecting a base pair at which a large deviation from the normalized free energy
value is calculated, relative to at least one other base pair;

such that an optimized ligand binding site on the nucleic acid sequence is
identified.